Toward Improving the Automated Classification of Metonymy in Text Corpora

Author

  • Francis M. O. Ferraro

Abstract

In this paper, we explore methods for improving the automatic classification of schematic metonymies in corpus-based text. Using a pre-existing dataset of thousands of samples, we hypothesize that better modeling of the underlying syntactic, semantic and conceptual meanings within a document (sample) can aid automated metonymy classification. To test this hypothesis, we build on previous researchers' work on metonymy resolution systems, and we also introduce novel features. These include, but are not limited to, the extraction and analysis of conceptual paths between syntactically and semantically connected words. We initially explore three models for this classification task but settle on two of them for final evaluation. Purely quantitatively, the results indicate that more work needs to be done. A more detailed analysis, however, lets us explain both the strengths of the ideas behind our methods and the weaknesses of the databases used for feature extraction. Finally, we present ways in which this work can be continued and expanded.

∗This work was completed in partial fulfillment of the requirements for an Honors Bachelor of Science Degree in Computer Science from the Department of Computer Science at the University of Rochester, in Rochester, NY, USA.
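The abstract names conceptual paths between syntactically and semantically connected words as one of the novel features. The sketch below shows one way such a feature *could* be computed. It is not the author's implementation. It assumes a hypothetical toy concept graph of (head, relation, tail) triples, invented here for illustration rather than taken from the databases the paper used. The function names `build_graph`, `conceptual_path` and `path_features` are likewise hypothetical. A breadth-first search finds a shortest relation path between a possibly metonymic word and a word it is connected to, and the path is then turned into sparse string features for a classifier.

```python
from collections import deque

# Hypothetical toy concept graph as (head, relation, tail) triples.
# Invented for illustration; NOT the knowledge bases used in the paper.
TRIPLES = [
    ("washington", "IsA", "city"),
    ("city", "IsA", "location"),
    ("city", "HasA", "government"),
    ("government", "CapableOf", "announce"),
    ("person", "CapableOf", "announce"),
]


def build_graph(triples):
    """Build an adjacency map; each edge is also stored in reverse so paths can run either way."""
    graph = {}
    for head, rel, tail in triples:
        graph.setdefault(head, []).append((rel, tail))
        graph.setdefault(tail, []).append(("~" + rel, head))  # "~" marks an inverse edge
    return graph


def conceptual_path(graph, source, target):
    """Breadth-first search for a shortest relation path from source to target.

    Returns the list of relation labels along the path, or None if the
    two concepts are not connected in the graph.
    """
    queue = deque([source])
    parent = {source: None}  # node -> (previous node, relation used to reach it)
    while queue:
        node = queue.popleft()
        if node == target:
            relations = []
            while parent[node] is not None:  # walk back to the source
                prev, rel = parent[node]
                relations.append(rel)
                node = prev
            return relations[::-1]
        for rel, nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, rel)
                queue.append(nxt)
    return None


def path_features(path):
    """Turn a relation path into sparse features (feature name -> value)."""
    if path is None:
        return {"path=NONE": 1}
    return {"path=" + ">".join(path): 1, "path_len": len(path)}


graph = build_graph(TRIPLES)
path = conceptual_path(graph, "washington", "announce")
print(path)                 # → ['IsA', 'HasA', 'CapableOf']
print(path_features(path))  # → {'path=IsA>HasA>CapableOf': 1, 'path_len': 3}
```

For a location-for-government reading such as "Washington announced new sanctions", the toy path IsA>HasA>CapableOf links "Washington" to its governing verb through a government concept, and a classifier could treat such a path as evidence for a metonymic reading. Breadth-first search is used here only because it returns a shortest path in an unweighted graph; a real system would depend on the coverage of its underlying concept database.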

Similar resources

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to automatically create modules that would help speed up the classification process for any text categorization task. It also serves as a tool for...

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in the amount of information, tools and methods for searching, filtering and managing resources are highly needed. One of the major problems in text classification is the high dimensionality of the feature space. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...

Improving Automated Alignment in Multilingual Corpora

We report on methods for improving multilingual text alignments produced by a simple dynamic-programming scheme, through automated detection of possible misalignments. Details of methods involving cognates, specially identified words, and the propositional contents of sentences are given, together with notable features of their performance on parallel corpora in a number of different types ...

Combining Collocations, Lexical and Encyclopedic Knowledge for Metonymy Resolution

This paper presents a supervised method for resolving metonymies. We enhance a commonly used feature set with features extracted based on collocation information from corpora, generalized using lexical and encyclopedic knowledge to determine the preferred sense of the potentially metonymic word using methods from unsupervised word sense disambiguation. The methodology developed addresses one is...

Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora

We propose a technique for improving language modeling for automated speech recognition of medical dictations by interpolating finished text (25M words) with small human-generated literal and/or machine-generated semi-literal corpora. By building and testing interpolated (ILM) with literal (LILM), semi-literal (SILM) and partial (PILM) corpora, we show that both perplexity and recognition results ...

Publication date: 2011